PT: keep the same checkpoint behavior as TF #3191
Set the default `save_ckpt` to `model.ckpt`, which is used as the prefix. When saving checkpoints, `model.ckpt-100.pt` will be saved, and `model.ckpt.pt` will be symlinked to `model.ckpt-100.pt`. A `checkpoint` file will be saved to record `model.ckpt-100.pt`. This keeps the same behavior as the TF backend.

Signed-off-by: Jinzhe Zeng <[email protected]>
Codecov Report

Additional details and impacted files

    @@            Coverage Diff             @@
    ##            devel    #3191      +/-   ##
    ==========================================
    - Coverage   74.27%   74.27%   -0.01%
    ==========================================
      Files         343      343
      Lines       31629    31634       +5
      Branches     1592     1592
    ==========================================
    + Hits        23494    23497       +3
    - Misses       7210     7212       +2
      Partials      925      925
Hi @njzjz, may I ask why different file extensions are needed? The files
and the file
Can we just use one of these extensions, for convenience when collecting files in dpgen?
No control flow is saved in the checkpoint file.
Set the default `save_ckpt` to `model.ckpt` as the prefix. When saving checkpoints, `model.ckpt-100.pt` will be saved, and `model.ckpt.pt` will be symlinked to `model.ckpt-100.pt`. A `checkpoint` file will be dedicated to record `model.ckpt-100.pt`.

This keeps the same behavior as the TF backend. One can do the below using the PT backend just like the TF backend:

    dp --pt train input.json
    # one can cancel the training before it finishes
    dp --pt freeze